Structure of the SARS coronavirus main proteinase
as an active C2 crystallographic dimer


The 34 kDa main proteinase (Mpro) from the severe acute respiratory syndrome
coronavirus (SARS-CoV) plays an important role in the virus life cycle through
the specific processing of viral polyproteins. As such, SARS-CoV Mpro is a key
target for the identification of specific inhibitors directed against the SARS
virus. With a view to facilitating the development of such compounds, crystals
of the enzyme that diffract to a resolution of 1.9 Å were obtained at pH 6.5 in
the orthorhombic space group P2₁2₁2. These crystals contain one monomer per
asymmetric unit and the biologically active dimer is generated via the
crystallographic twofold axis. The conformation of the catalytic site indicates
that the enzyme is active in the crystalline form and thus suitable for
structure-based inhibition studies.


1. Introduction 

Severe acute respiratory syndrome (SARS) is a severe form of 
pneumonia. Its transmission pattern, high mortality rate and possible 
re-emergence in the future make SARS a serious threat for which 
neither efficient therapy nor vaccine is currently available. The 
disease is caused by a member of the coronavirus family: the SARS 
coronavirus (SARS-CoV; Fouchier et al., 2003). Following viral entry 
into cells, two polyproteins named pp1a and pp1ab, with molecular
weights of 486 and 790 kDa, respectively, are synthesized (Rota et al.,
2003). During the viral life cycle, pp1a and pp1ab are processed into
15 putative non-structural proteins by two viral proteases: the
papain-like protease and the main proteinase Mpro (also named the 3C-like
protease, 3CLpro; reviewed in Ziebuhr et al., 2000). In SARS-CoV,
Mpro is responsible for the cleavage of 11 sites in the replicase
polyproteins (Snijder et al., 2003), releasing viral enzymes needed for
replication, such as the RNA-dependent RNA polymerase and the
helicase, as well as other accessory proteins and non-structural
proteins the functions of which are not fully understood. Thus, given
its pivotal role in the viral life cycle, Mpro is an attractive target for the
development of drugs directed against the SARS virus.
Three-dimensional structures of Mpro enzymes have been reported for
several coronaviruses including human CoV (HCoV229E; Anand et
al., 2003), porcine transmissible gastroenteritis virus (TGEV; Anand
et al., 2002) and SARS-CoV (Yang et al., 2003). In this study, using an
Escherichia coli overexpression system, we purified the SARS-CoV
Mpro and obtained a novel crystal form at pH 6.5 that diffracts to high
resolution and contains one monomer per asymmetric unit. The
active-site residues and the oxyanion hole adopt a functional
conformation, indicating that this crystal form might be useful for
structure-based drug design.


2. Experimental 

2.1. Protein expression and purification 

The DNA fragment encoding the SARS-CoV Mpro strain SIN 2774
(Ruan et al., 2003) was amplified by PCR using Pfu polymerase
(Stratagene) and cloned into pMAL-c2x (New England Biolabs)
incorporating the maltose-binding protein (MBP) at the N-terminus
of SARS-CoV Mpro. The forward primer
(5'-TACTAATTGAAGGAGTTCGGGTTTTAGGAAAATGG-3') contains an XmnI site
(bold). The reverse primer (5'-AGCCGGATCCTTATTGGAAGGTAACACCAG-3')
contains a BamHI site (bold) downstream of the
stop codon TAA. Four additional amino acids (IEGR) were
introduced to facilitate the removal of MBP by factor Xa. Transformed
BL21(DE3) E. coli cells were grown at 310 K in LB media
supplemented with 0.2% glucose until an OD600nm of 0.6-0.8 was attained.
IPTG was added to a final concentration of 1 mM and the
temperature was lowered to 303 K. After 2 h, cells were harvested by
centrifugation at 8000g for 10 min, resuspended in buffer A (20 mM
Tris-HCl pH 7.4, 50 mM NaCl, 1 mM EDTA) and lysed by sonication
for 20 min, followed by centrifugation at 20 000g for 20 min at 277 K.
The supernatant was loaded onto an Econo-column (Bio-Rad)



Figure 1

(a) Overall superposition of the Cα traces from the SARS-CoV Mpro monomer
present in our asymmetric unit (coloured red, PDB code 2c3s) with the active
monomer A of Yang et al. (2003) (shown in blue, PDB code 1uj1, chain A). A
residual rotation of 4.5° is then needed to bring the two equivalent monomers B
into coincidence. The active-site residues of each monomer are represented as
sticks. (b) Detailed view of the active site represented as green sticks (2c3s, this
work) superimposed onto the active monomer A of SARS-CoV Mpro (1uj1, chain
A). The putative hydrogen bonds (dashed lines) formed by the spatially conserved
water molecule (red sphere) are shown.


Table 1

Data-collection and refinement statistics.

Values in parentheses refer to the highest resolution shell.

Data-collection statistics
  Space group                          P2₁2₁2
  Unit-cell parameters (Å)             a = 107.7, b = 44.9, c = 54.2
  Resolution range (Å)                 28-1.90 (1.95-1.90)
  Unique reflections                   19895
  Redundancy                           8.0 (5.9)
  Completeness (%)                     97.9 (88.2)
  I/σ(I)                               4.6 (2.2)
  Rmerge (%)                           7.8 (31.4)
  VM (Å³ Da⁻¹)                         1.95

Refinement statistics
  R† (%)                               22.5 (27.5)
  Rfree value (%)                      26.4 (31.2)
  No. of protein atoms                 2302 [301 residues]
  No. of solvent molecules             211
  No. of reflections in working set    19880
  No. of reflections in test set       1077
  Mean temperature factor (Å²)         35.21
  R.m.s.d. bond lengths (Å)            0.006
  R.m.s.d. bond angles (°)             1.31
  R.m.s.d. dihedral angles (°)         24.7

Ramachandran plot
  Most favoured region (%)             87.7
  Additionally allowed regions (%)     11.1
  Generously allowed regions (%)       0.8
  Disallowed regions (%)               0.4

† $R = \sum |F_{\mathrm{calc}} - F_{\mathrm{obs}}| / \sum |F_{\mathrm{obs}}|$.


packed with amylose resin (New England Biolabs) equilibrated with
buffer A and incubated overnight at 277 K. The fusion protein was
eluted at 277 K using 20 mM Tris-HCl pH 7.4, 50 mM NaCl, 1 mM
EDTA, 10 mM maltose and loaded onto a HiPrep 16/10 Q Sepharose
FF column (Amersham) equilibrated with buffer B (20 mM Tris-HCl
pH 8.0, 50 mM NaCl, 1 mM EDTA). Proteins were eluted using a
linear NaCl concentration gradient in buffer C (20 mM Tris-HCl pH
8.0, 1 M NaCl, 1 mM EDTA). Fractions containing MBP-SARS-CoV
Mpro were pooled, concentrated by ultrafiltration at 3000g (Centricon,
Vivascience) and desalted in 20 mM Tris-HCl pH 7.0, 50 mM
NaCl, 1 mM CaCl2 using PD-10 columns (Amersham). One unit of
factor Xa was added per 142 µg of fusion protein for 6 h at 297 K.
After cleavage, factor Xa was removed using a resin (Qiagen).
Cleaved products were loaded onto an XK 16/20 phenyl Sepharose
resin column (Amersham) equilibrated in buffer D (12.5 mM Tris-HCl
pH 7.0, 300 mM NaCl, 1 mM DTT, 0.1 mM EDTA). The
recombinant SARS-CoV Mpro was eluted using buffer E (12.5 mM
Tris-HCl pH 7.0, 1 mM DTT, 0.1 mM EDTA). Fractions containing
SARS-CoV Mpro were pooled and the buffer was changed to 10 mM
Tris-HCl pH 7.4, 1 mM EDTA, 1 mM DTT. The protein was concentrated
to 5 mg ml⁻¹, as determined using the Bradford method (Bio-Rad) with
BSA as a standard, and stored at 193 K.
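
As a quick sanity check on the primer design described at the start of this section, the restriction sites and the stop codon can be located programmatically. The sketch below is illustrative only: the primer strings are the sequences quoted in the text with spacing removed (an assumption about the transcription), the recognition patterns GAANNNNTTC (XmnI) and GGATCC (BamHI) are the standard ones, and the helper name `reverse_complement` is hypothetical.

```python
import re

# Primer sequences as quoted in the text, with spacing removed (assumption).
FORWARD = "TACTAATTGAAGGAGTTCGGGTTTTAGGAAAATGG"
REVERSE = "AGCCGGATCCTTATTGGAAGGTAACACCAG"

XMNI = re.compile(r"GAA[ACGT]{4}TTC")  # XmnI recognition pattern GAANNNNTTC
BAMHI = "GGATCC"                        # BamHI recognition sequence


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


xmni_site = XMNI.search(FORWARD)
coding = reverse_complement(REVERSE)   # reverse primer read on the coding strand
stop_pos = coding.find("TAA")          # first TAA on the coding strand
bamhi_pos = coding.find(BAMHI)         # start of the BamHI site

print("XmnI site in forward primer:", xmni_site.group() if xmni_site else None)
print("coding-strand view of reverse primer:", coding)
print("stop codon at", stop_pos, "BamHI site at", bamhi_pos)
```

Read on the coding strand, the BamHI site should start at or after the end of the TAA codon, which matches the statement that the site lies downstream of the stop codon.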

2.2. Crystallization and data collection 

Crystals of SARS-CoV Mpro were grown using the hanging-drop
vapour-diffusion method. Equal volumes (1 µl) of protein and
mother liquor were mixed over wells containing 0.1 M MES pH 6.5
and 0.6 M (NH4)2SO4 at 291 K. Macroseeding produced thin
elongated plate-like crystals over a period of one week. For data
collection, crystals were soaked in a cryoprotecting solution containing
30% glycerol, 0.1 M MES, 0.6 M (NH4)2SO4 pH 6.5, before being
mounted and cooled to 100 K in a nitrogen-gas stream (Oxford
Cryosystems). Diffraction intensities were recorded at beamline
ID14-4 at the European Synchrotron Radiation Facility, Grenoble,
France on an ADSC CCD detector using an attenuated beam of
0.125 × 0.050 mm. Integration, scaling and merging of the intensities
were carried out using programs from the CCP4 suite (Collaborative
Computational Project, Number 4, 1994). Data-collection and
refinement statistics are presented in Table 1.
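
The Matthews coefficient listed in Table 1 can be checked from the unit-cell parameters. The following is a minimal sketch, assuming an orthorhombic cell (volume a × b × c), the four symmetry operators of P2₁2₁2, one monomer per asymmetric unit and the rounded 34 kDa mass quoted in the abstract; the function names are hypothetical, and the solvent estimate uses the commonly quoted approximation 1 − 1.23/VM rather than anything stated in this work.

```python
def matthews_coefficient(a, b, c, n_sym_ops, mw_da, copies_per_asu=1):
    """V_M in cubic angstroms per dalton for an orthorhombic cell."""
    cell_volume = a * b * c  # all cell angles are 90 degrees
    return cell_volume / (n_sym_ops * copies_per_asu * mw_da)


def solvent_fraction(v_m):
    """Approximate solvent fraction from the common relation 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


# Cell edges from Table 1; 4 symmetry operators in P2(1)2(1)2.
# 34 000 Da is the rounded mass from the abstract (the exact mass may differ).
v_m = matthews_coefficient(107.7, 44.9, 54.2, n_sym_ops=4, mw_da=34_000)

print(f"V_M = {v_m:.2f} A^3/Da")
print(f"estimated solvent fraction = {solvent_fraction(v_m):.2f}")
```

With the rounded mass the result comes out slightly below the tabulated 1.95 Å³ Da⁻¹; a difference of this size is what one would expect from rounding the molecular weight.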

2.3. Structure determination and refinement 

The structure of SARS-CoV Mpro was readily solved by molecular
replacement using the program AMoRe from the CCP4 suite with the
SARS-CoV Mpro structure deposited as PDB code 1uj1 as a search
model. The program REFMAC5 was used for refinement cycles,
which were alternated with rebuilding sessions using the program O
(Jones et al., 1991). 5% of the reflections were set aside to monitor the
progress of refinement using the Rfree factor. Water molecules, added
automatically using ARP/wARP (Perrakis et al., 1999), were checked
by visual inspection. The quality of the model was assessed using
PROCHECK (Laskowski et al., 1993). Structure superposition was
performed with LSQKAB (Collaborative Computational Project,
Number 4, 1994).
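
For readers less familiar with the refinement statistics, the R factor defined in the footnote to Table 1 can be written out in a few lines. This is a minimal sketch, not the REFMAC5 implementation: the amplitudes are hypothetical toy values, the function name `r_factor` is an assumption, and the test-set fraction is computed from the reflection counts in Table 1 on the assumption that the working and test sets are disjoint. The free R value is conventionally the same expression evaluated over the test-set reflections only.

```python
def r_factor(f_obs, f_calc):
    """R = sum(|F_calc - F_obs|) / sum(|F_obs|) over paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(fc - fo) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical toy amplitudes, for illustration only (not data from this study).
toy_f_obs = [10.0, 20.0, 30.0]
toy_f_calc = [11.0, 18.0, 30.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)

# Reflection counts from Table 1: share of reflections held out for the free R.
n_work, n_test = 19880, 1077
test_fraction = n_test / (n_work + n_test)

print(f"toy R = {toy_r:.3f}")
print(f"test-set fraction = {test_fraction:.1%}")
```

The computed test-set fraction is close to the 5% quoted in the text.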

3. Results and discussion 

3.1. Overall structure of SARS-CoV M pro 

The model comprises one monomer per asymmetric unit (residues
1-301). Five residues from the C-terminus are not visible in the
electron-density map and have been omitted. Residues 1-101
(domain I) and 102-184 (domain II) form the chymotrypsin-like
double-β-barrel structure which is observed in several viral proteases
including those of picornaviruses, togaviruses and flaviviruses (Babe &
Craik, 1997). The C-terminal α-helical domain (residues 201-301) of
SARS-CoV Mpro is required for activity, since a truncated fragment
comprising only its catalytic domain displays a significant decrease in
enzymatic activity (Bacha et al., 2004). Structural and functional
studies of coronavirus Mpro have shown that dimerization is required
for maximal protease activity. In this respect, a prominent role is
played by the seven amino-terminal amino acids, which adopt an
extended conformation making extensive contacts with domain II of
the other monomer and ensuring the formation of a catalytically
competent active site (Yang et al., 2003). In our crystal form, the
active dimer is generated through the crystallographic twofold axis. No
contact is established by Ser1, which is mobile as shown by a
higher-than-average temperature factor. The path of the main chain,
however, closely follows that observed in previously reported active
monomers, with an r.m.s. deviation of 0.80 Å for 300 equivalent
main-chain atoms (PDB code 1uj1, chain A; Yang et al., 2003) (Fig. 1a). This
latter crystal form belongs to space group P2₁ and contains one dimer
in the asymmetric unit with quasi-twofold symmetry. The mobility of
Ser1, combined with the conserved main-chain path, indicates
that the N-terminal residue is not absolutely required for the active
site to adopt an active conformation.
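
To illustrate how a crystallographic twofold generates the second monomer of the dimer from the one in the asymmetric unit, the sketch below applies a twofold rotation about an axis parallel to c, which in fractional coordinates maps (x, y, z) to (2x₀ − x, 2y₀ − y, z) for an axis passing through (x₀, y₀). It is a sketch under stated assumptions: the atom position is hypothetical, the function name `twofold_mate` is invented for this example, and the location of the axis relating the two monomers in this crystal is not given in the text, so the default axis through the origin is purely illustrative.

```python
def twofold_mate(xyz, cell, axis_frac_xy=(0.0, 0.0)):
    """Map Cartesian coordinates (angstroms) through a twofold axis parallel to c.

    cell is (a, b, c) for an orthorhombic lattice; axis_frac_xy is the
    fractional (x, y) position of the axis (hypothetical default).
    """
    a, b, c = cell
    x, y, z = xyz[0] / a, xyz[1] / b, xyz[2] / c   # Cartesian -> fractional
    ax, ay = axis_frac_xy
    xm, ym, zm = 2 * ax - x, 2 * ay - y, z         # 180 degree rotation about the axis
    return (xm * a, ym * b, zm * c)                # fractional -> Cartesian


CELL = (107.7, 44.9, 54.2)   # unit-cell edges from Table 1 (angstroms)
atom = (12.0, 5.0, 20.0)     # hypothetical atom position (angstroms)

mate = twofold_mate(atom, CELL)
back = twofold_mate(mate, CELL)   # applying the twofold twice restores the atom

print("atom:", atom)
print("symmetry mate:", mate)
print("after second application:", back)
```

Because the operation is an exact crystallographic symmetry, the two monomers of the dimer are strictly identical, in contrast to the quasi-twofold dimer of the P2₁ form.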

A figure showing the distribution of the thermal factors of the
SARS-CoV main proteinase is available as supplementary material.¹

¹ Supplementary material is available from Crystallography Journals Online
(Reference: SW5004).


3.2. Structure of the active site 

The substrate-binding site is located in a cleft between the two
β-barrels. The catalytic Cys145-His41 dyad (with the cysteine thiol
acting as the nucleophile) is used instead of the classical Ser-His-Asp
triad of serine proteases (Fig. 1b). Although the crystals were
obtained at pH 6.5, a value which is presumably near the pKa value of
His residues in the substrate-binding site and where the enzyme
shows a slightly reduced activity, the conformation of the active site
indicates an active enzyme (Fig. 1b). The immediate vicinity of the
active site is involved in extensive intermolecular contacts with
neighbouring molecules. Thus, this crystal form is likely to be more
suitable for studies involving soaking or co-crystallization of small
compounds rather than long peptides. Interestingly, during the course
of preparation and submission of this manuscript, related crystal
forms of SARS-CoV Mpro have been reported by Hsu et al. (2005)
and by Tan et al. (2005).
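
The remark above that pH 6.5 presumably lies near the pKa of histidine residues in the substrate-binding site can be made concrete with the Henderson-Hasselbalch relation. The sketch below is illustrative only: the pKa of 6.5 is an assumed value chosen to match that remark, not a measurement from this work, and the function name is hypothetical.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_HIS_PKA = 6.5   # assumption for illustration; not measured in this work

for ph in (6.0, 6.5, 7.0, 7.5):
    frac = fraction_protonated(ph, ASSUMED_HIS_PKA)
    print(f"pH {ph:.1f}: {frac:.2f} protonated")
```

Under this assumption roughly half of such histidine side chains would be protonated at the crystallization pH, which is in line with the slightly reduced activity mentioned in the text.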